Algorithms for testing fault-tolerance of sequenced jobs

Authors

  • Marek Chrobak
  • Mathilde Hurand
  • Jirí Sgall
Abstract

We study the problem of testing whether a given set of sequenced jobs can tolerate transient faults. We present efficient algorithms for this problem in several fault models. A fault model describes what types of faults are allowed and specifies assumptions on their frequency. Two types of faults are considered: hidden faults, which can only be detected after a job completes, and exposed faults, which can be detected immediately. First, we give an O(n)-time fault-tolerance testing algorithm, for both exposed and hidden faults, if the number of faults does not exceed a given parameter k. Then we consider the model in which any two faults are separated in time by a gap of length at least Δ, where Δ is at least twice the maximum job length. For exposed faults we give an O(n)-time algorithm. For hidden faults we give an algorithm with running time O(n log n), and we prove that if job lengths are distributed uniformly over an interval [0, p_max], then this algorithm's expected running time is O(n). Our experimental study shows that this linear-time performance extends to other distributions. Finally, we provide evidence that improving the worst-case performance may not be possible, by proving an Ω(n log n) lower bound, in the algebraic computation tree model, on a slight generalization of this problem.
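The bounded-fault case can be illustrated with a minimal sketch. This page does not state the paper's exact model, so the code assumes a simplified one: jobs run back-to-back from time 0 in the given order, job i has a length p_i and a deadline d_i (deadlines are an assumption here, not stated in the abstract), each fault forces the running job to be re-executed from scratch and wastes at most its full length, and the same job may fault repeatedly. Under these assumptions the worst case for job j puts all k faults on the longest job among the first j, so one pass with a running sum and a running maximum decides fault tolerance in O(n) time. The function `tolerates_k_faults` and its parameters are hypothetical, and this is not presented as the authors' algorithm.

```python
from typing import Sequence


def tolerates_k_faults(lengths: Sequence[float], deadlines: Sequence[float], k: int) -> bool:
    """Return True if every job meets its deadline under any pattern of at most k faults.

    Simplified model (assumption, not taken from the paper): jobs run
    back-to-back in the given order starting at time 0; a fault forces the
    running job to restart from scratch, wasting at most its full length;
    the same job may be hit by several faults.
    """
    if len(lengths) != len(deadlines):
        raise ValueError("lengths and deadlines must have the same size")
    prefix_sum = 0.0  # fault-free completion time of the current job
    prefix_max = 0.0  # longest job among those run so far
    for p, d in zip(lengths, deadlines):
        prefix_sum += p
        prefix_max = max(prefix_max, p)
        # Worst case for this job: all k faults hit the longest job seen so far.
        if prefix_sum + k * prefix_max > d:
            return False
    return True
```

For example, with lengths [2, 3, 1], a deadline of 10 for every job and k = 1, the check passes (the third job's worst-case completion is 6 + 3 = 9), while k = 2 fails at the second job (5 + 2·3 = 11 > 10). In this simplified model hidden and exposed faults give the same criterion, because an exposed fault wastes strictly less than p_i but can come arbitrarily close to it.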

Similar resources

Improving the palbimm scheduling algorithm for fault tolerance in cloud computing

Cloud computing is the latest technology that involves distributed computation over the Internet. It meets the needs of users through sharing resources and using virtual technology. The workflow user applications refer to a set of tasks to be processed within the cloud environment. Scheduling algorithms have a lot to do with the efficiency of cloud computing environments through selection of su...

Reliability and Performance Evaluation of Fault-aware Routing Methods for Network-on-Chip Architectures (RESEARCH NOTE)

Nowadays, faults and failures are increasing especially in complex systems such as Network-on-Chip (NoC) based Systems-on-a-Chip due to the increasing susceptibility and decreasing feature sizes. On the other hand, fault-tolerant routing algorithms have an evident effect on tolerating permanent faults and improving the reliability of a Network-on-Chip based system. This paper presents reliabili...

Towards high-available and energy-efficient virtual computing environments in the cloud

Empowered by virtualisation technology, cloud infrastructures enable the construction of flexible and elastic computing environments, providing an opportunity for energy and resource cost optimisation while enhancing system availability and achieving high performance. A crucial requirement for effective consolidation is the ability to efficiently utilise system resources for high-availability co...

Automating Fault Tolerance in High-Performance Computational Biological Jobs Using Multi-Agent Approaches

BACKGROUND Large-scale biological jobs on high-performance computing systems require manual intervention if one or more computing cores on which they execute fail. This places not only a cost on the maintenance of the job, but also a cost on the time taken for reinstating the job and the risk of losing data and execution accomplished by the job before it failed. Approaches which can proactively...

Dual-Mode r-Reliable Task Model for Flexible Scheduling in Reliable Real-Time Systems

Recent research in real-time systems has much focused on new task models for flexible scheduling and fault-tolerant real-time systems. In this paper, we propose a novel task model for the purpose of flexible scheduling in reliable real-time systems. In the proposed dual-mode r-reliable task model, a task periodically releases fast mode jobs or reliable mode jobs with the constraint that reliable...

Journal title:
  • J. Scheduling

Volume 12

Publication date: 2009